A Novel Variational Lower Bound for Inverse Reinforcement Learning
Inverse reinforcement learning (IRL) seeks to learn the reward function from
expert trajectories in order to understand a task for imitation or
collaboration, thereby removing the need for manual reward engineering.
However, IRL in the
context of large, high-dimensional problems with unknown dynamics has been
particularly challenging. In this paper, we present a new Variational Lower
Bound for IRL (VLB-IRL), which is derived under the framework of a
probabilistic graphical model with an optimality node. Our method
simultaneously learns the reward function and policy under the learned reward
function by maximizing the lower bound, which is equivalent to minimizing the
reverse Kullback-Leibler divergence between an approximated distribution of
optimality given the reward function and the true distribution of optimality
given trajectories. This leads to a new IRL method that learns a valid reward
function such that the policy under the learned reward achieves expert-level
performance on several known domains. Importantly, the method outperforms
existing state-of-the-art IRL algorithms on these domains, with the learned
policy achieving higher reward.
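The core objective described above can be summarized as follows; the notation
is ours and only sketches the paper's formulation, with \mathcal{O} the
optimality variable, r_\theta the learned reward, and \tau a trajectory:

    \max_{\theta} \; \mathcal{L}_{\mathrm{VLB}}(\theta)
    \;\Longleftrightarrow\;
    \min_{\theta} \; D_{\mathrm{KL}}\!\left( q(\mathcal{O} \mid r_{\theta})
    \,\big\|\, p(\mathcal{O} \mid \tau) \right)

That is, maximizing the variational lower bound is equivalent to minimizing
the reverse Kullback-Leibler divergence between the approximated distribution
of optimality given the learned reward and the true distribution of
optimality given trajectories.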
Adaptive Agent Architecture for Real-time Human-Agent Teaming
Teamwork is a set of interrelated reasoning, actions, and behaviors of team
members that facilitate common objectives. Teamwork theory and experiments
have resulted in a set of states and processes for team effectiveness in both
human-human and agent-agent teams. However, human-agent teaming is less well
studied because it is relatively new and involves asymmetries in policy and
intent not present in human-only teams. To optimize team performance in
human-agent teaming, it is critical that agents infer human intent and adapt
their policies for smooth
coordination. Most literature on human-agent teaming builds agents that
reference a learned human model. Though these agents are guaranteed to
perform well with the learned model, they place strong assumptions on human
policies, such as optimality and consistency, which are unlikely to hold in
many real-world scenarios. In this paper, we propose a novel adaptive agent
architecture in a human-model-free setting for a two-player cooperative game,
namely Team Space Fortress (TSF). Previous human-human team research has
shown complementary policies in the TSF game and diversity in human players'
skill, which encourages us to relax the assumptions on human policy.
Therefore, we forgo learning human models from
human data, and instead use an adaptation strategy on a pre-trained library of
exemplar policies composed of RL algorithms or rule-based methods with
minimal assumptions about human behavior. The adaptation strategy relies on a
novel similarity metric to infer the human's policy and then selects the most
complementary policy in our library to maximize team performance. The
adaptive agent architecture can be deployed in real time and generalizes to
any off-the-shelf static agent. We conducted human-agent experiments to
evaluate the proposed
adaptive agent framework, and demonstrated the suboptimality, diversity, and
adaptability of human policies in human-agent teams.
Comment: The first three authors contributed equally. In AAAI 2021 Workshop
on Plan, Activity, and Intent Recognition.
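A minimal sketch of the adaptation strategy described above, in Python. The
interface is hypothetical: infer_human_policy, action_prob, and
complement_table are our names, and the log-likelihood score only stands in
for the paper's similarity metric, which is not reproduced here.

    import numpy as np

    def infer_human_policy(observations, exemplar_policies):
        """Score each exemplar policy by how well it explains the observed
        human state-action pairs; higher mean log-likelihood = closer."""
        scores = []
        for policy in exemplar_policies:
            log_liks = [np.log(policy.action_prob(s, a) + 1e-8)
                        for s, a in observations]
            scores.append(np.mean(log_liks))
        return int(np.argmax(scores))

    def select_partner(observations, exemplar_policies, complement_table):
        """Return the library policy precomputed to be most complementary
        to the exemplar that best matches the observed human behavior."""
        human_idx = infer_human_policy(observations, exemplar_policies)
        return complement_table[human_idx]

Because inference needs only a running buffer of observed state-action pairs,
the selection step can be re-run online, which is what allows real-time
deployment with any library of off-the-shelf static agents.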
Aligning Large Multimodal Models with Factually Augmented RLHF
Large Multimodal Models (LMMs) are built across modalities, and misalignment
between the two modalities can result in "hallucination": generating textual
outputs that are not grounded in the multimodal information in context.
To address the multimodal misalignment issue, we adapt Reinforcement
Learning from Human Feedback (RLHF) from the text domain to the task of
vision-language alignment, where human annotators are asked to compare two
responses and pinpoint the more hallucinated one, and the vision-language model
is trained to maximize the simulated human rewards. We propose a new alignment
algorithm called Factually Augmented RLHF that augments the reward model with
additional factual information such as image captions and ground-truth
multi-choice options, which alleviates the reward hacking phenomenon in RLHF
and further improves the performance. We also enhance the GPT-4-generated
training data (for vision instruction tuning) with previously available
human-written image-text pairs to improve the general capabilities of our
model. To evaluate the proposed approach in real-world scenarios, we develop
a new evaluation benchmark, MMHAL-BENCH, with a special focus on penalizing
hallucinations. As the first LMM trained with RLHF, our approach achieves
remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the
performance level of the text-only GPT-4 (while previous best methods only
achieve the 87% level), and a 60% improvement over other baselines on
MMHAL-BENCH. We open-source our code, model, and data at
https://llava-rlhf.github.io.
Comment: Preprint.
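A minimal sketch of the factual augmentation idea, assuming a generic
pairwise reward model; the function names and prompt format below are ours,
not the released implementation. Factual context (e.g., a ground-truth
caption) is added to the reward model's input so it can check grounding
rather than guess, which is what mitigates reward hacking.

    import torch
    import torch.nn.functional as F

    def build_reward_input(prompt, response, facts):
        """Prepend factual context (image captions, ground-truth
        multiple-choice options) to the usual prompt/response pair."""
        fact_block = "\n".join(f"Fact: {f}" for f in facts)
        return f"{fact_block}\nUser: {prompt}\nAssistant: {response}"

    def preference_loss(reward_model, prompt, chosen, rejected, facts):
        """Standard pairwise reward-model loss, here with factually
        augmented inputs on both sides of the comparison."""
        r_chosen = reward_model(build_reward_input(prompt, chosen, facts))
        r_rejected = reward_model(build_reward_input(prompt, rejected, facts))
        return -F.logsigmoid(r_chosen - r_rejected)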
Individualized Mutual Adaptation in Human-Agent Teams
The ability to collaborate with previously unseen human teammates is crucial for artificial agents to be effective in human-agent teams (HATs). Due to individual differences and complex team dynamics, it is hard to develop a single agent policy that matches all potential teammates. In this paper, we study both human-human and human-agent teams in a dyadic cooperative task, Team Space Fortress (TSF). Results show that team performance is influenced by both players’ individual skill level and their ability to collaborate with different teammates by adopting complementary policies. Based on the human-human team results, we propose an adaptive agent that identifies different human policies and assigns a complementary partner policy to optimize team performance. The adaptation method relies on a novel similarity metric to infer human policy and then selects the most complementary policy from a pre-trained library of exemplar policies. We conducted human-agent experiments to evaluate the adaptive agent and examine mutual adaptation in human-agent teams. Results show that both human adaptation and agent adaptation contribute to team performance.
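Complementing the inference sketch under the adaptive-architecture abstract
above, a minimal illustration of the selection step, assuming a hypothetical
precomputed cross-play matrix (the values below are made up, not TSF
results): entry [i][j] holds the mean team score when human-like exemplar i
partners with library policy j.

    import numpy as np

    # Hypothetical cross-play scores: rows = inferred human-like exemplar,
    # columns = candidate partner policies from the pre-trained library.
    crossplay = np.array([
        [0.42, 0.61, 0.35],   # exemplar 0 pairs best with partner 1
        [0.58, 0.40, 0.66],   # exemplar 1 pairs best with partner 2
        [0.50, 0.52, 0.49],   # exemplar 2 pairs best with partner 1
    ])

    def complementary_partner(human_idx, crossplay):
        """Return the library policy with the highest expected team score
        when paired with the inferred human policy."""
        return int(np.argmax(crossplay[human_idx]))

    print(complementary_partner(1, crossplay))  # -> 2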